Teaching Linear Models, Machine Learning, and Deep Learning

The position was based at SupAgro (Institut Agro Montpellier), from January 2025 to August 2025.

The main challenges concerned teaching and pedagogy in data science for agronomy engineers (master's level). Secondary challenges included strengthening the laboratory's skills in deep learning practices and methodologies, and implementing technical solutions to ease the transition between R, widely used in statistical research, and Python. An application of these methods to 1-D near-infrared spectroscopy (NIR) data was also planned.

I worked within the MISTEA laboratory, in the Data Manager UE3 program at SupAgro.

Tasks & Objectives

My role included several responsibilities:

  • teaching statistics, linear regression, and spatial statistics to students lacking the prerequisites to follow the general course;
  • creating educational materials in data science, particularly around deep learning, applied to NIR spectrum data;
  • documenting, and creating packages and methods, to ease the transition between R and Python.

The objective was to provide the "same tools" as those offered by the R environment for creating resources, and to simplify, where possible, the translation of RMarkdown resources into Jupyter Notebooks (with HTML and PDF generation).
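The RMarkdown-to-notebook translation can be illustrated with a minimal sketch. The `rmd_to_notebook` helper below is hypothetical (not the package actually developed during the mission): it extracts fenced code chunks from an RMarkdown string and wraps prose and code into a minimal nbformat-4 notebook dict.

```python
import json
import re

# Hypothetical, simplified translator: RMarkdown code chunks (```{r} ... ```)
# become code cells, the prose in between becomes markdown cells.
CHUNK_RE = re.compile(r"```\{[a-zA-Z]+[^}]*\}\n(.*?)```", re.DOTALL)

def rmd_to_notebook(rmd_text: str) -> dict:
    """Translate an RMarkdown string into a notebook dict (nbformat 4)."""
    cells, pos = [], 0
    for match in CHUNK_RE.finditer(rmd_text):
        prose = rmd_text[pos:match.start()].strip()
        if prose:  # prose before the chunk -> markdown cell
            cells.append({"cell_type": "markdown", "metadata": {},
                          "source": prose.splitlines(keepends=True)})
        cells.append({"cell_type": "code", "metadata": {},
                      "execution_count": None, "outputs": [],
                      "source": match.group(1).splitlines(keepends=True)})
        pos = match.end()
    tail = rmd_text[pos:].strip()
    if tail:  # trailing prose -> markdown cell
        cells.append({"cell_type": "markdown", "metadata": {},
                      "source": tail.splitlines(keepends=True)})
    return {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": cells}

rmd = "# Title\n\nSome prose.\n\n```{r}\nsummary(df)\n```\n"
nb = rmd_to_notebook(rmd)
print(json.dumps(nb)[:80])
```

A real translation also has to map chunk options and YAML headers, which is exactly where a case-by-case methodology (and LLM assistance) becomes useful.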

The objectives were: providing the courses and follow-up for the linear regression and geospatial subjects in a support role; creating educational materials (regression and classification practicals) for deep learning and for classical machine learning algorithms (PLS, Partial Least Squares, and Random Forest); and upskilling the MISTEA laboratory on the use of Python and LLMs (Large Language Models).

The success criteria were: delivering the courses, producing reusable educational materials, improving the understanding and use of Python within the laboratory, and illustrating how LLMs can help scientists move faster, at least on the coding side.

Actions and Development

I delivered the courses and led the two subjects I was involved in. I created two detailed Jupyter Notebooks applied to NIR data, covering code organization and development best practices. I also created a methodology and a package to ease the transition from RMarkdown to Jupyter Notebooks (in Python).

Part of this work consisted of code and a command-line interface (CLI) for working with Jupyter Notebooks stored as .py files (rather than .ipynb), which makes LLM-assisted translation tasks easier. The other part concerned configuring and customizing the jupyter nbconvert CLI to generate the resources.
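The .py notebook representation can be sketched as follows. This is a simplified, hypothetical take on the "percent" cell format (cells delimited by `# %%` markers, as popularized by tools like Jupytext), not the mission's actual package:

```python
# Hypothetical sketch: a percent-format .py file keeps notebooks diff-able
# and easy to feed to an LLM as plain text. Cells are split on "# %%" markers;
# "# %% [markdown]" marks a markdown cell.

def split_cells(py_source: str) -> list:
    """Split a percent-format .py string into (cell_type, source) records."""
    cells, current, cell_type = [], [], "code"
    for line in py_source.splitlines():
        if line.startswith("# %%"):
            if current:  # flush the previous cell
                cells.append({"cell_type": cell_type,
                              "source": "\n".join(current).strip()})
            current = []
            cell_type = "markdown" if "[markdown]" in line else "code"
        else:
            current.append(line)
    if current:  # flush the last cell
        cells.append({"cell_type": cell_type,
                      "source": "\n".join(current).strip()})
    return cells

script = """\
# %% [markdown]
# ## Loading the spectra

# %%
import numpy as np
X = np.loadtxt("spectra.csv", delimiter=",")
"""
for cell in split_cells(script):
    print(cell["cell_type"], "->", cell["source"][:30])
```

Because the cells live in an ordinary text file, an LLM can be asked to translate or refactor them with standard diff-based workflows, which is the point of the .py representation.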

The steps followed included contacting several local research teams specialized in the subject, implementing classical machine learning algorithms (PLS, Random Forest, and SVM), and developing deep learning models with PyTorch, ranging from the perceptron to more complex models such as CNNs (Convolutional Neural Networks). I then taught for a little over a month. I am currently finalizing the educational materials and preparing the methodology and package, to present them to the laboratory's scientific community during seminars, with a deliverable.

I worked under the pedagogical supervision of Bénédicte Fontez and Philippe Vismara. For statistics, I supported Meili Baragatti. I exchanged with several laboratories specialized in NIR data, including the UMR AMAP of IRD (Institut de Recherche pour le Développement), and a CIRAD laboratory specialized in applying deep learning methods to this data, with Gregory Beurier and Denis Cornet. I also had other internal interactions on campus.

One challenge was that, although I had been close to data science for many years (participation in research and working groups, and reinforcement learning courses taught at Epitech), I had not had the opportunity to explore deep learning in depth on real problems. The statistics courses also let me revisit knowledge I had not actively used for several years.

For the main work on deep learning, I started with a relatively naive approach, beginning with standard machine learning algorithms on the NIR data. I then tried to be as methodical and impartial as possible, letting the models' results on the data drive the decisions. On 1-D data, it is not at all obvious that deep learning models give significantly better results than "traditional" algorithms, not to mention their lower explainability.
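On the deep learning side, the kind of 1-D CNN used for spectra can be sketched as follows (PyTorch assumed available; the architecture and sizes are illustrative, not the course's exact model):

```python
import torch
import torch.nn as nn

class SpectraCNN(nn.Module):
    """Minimal 1-D CNN for NIR spectra: conv blocks, then a linear head."""
    def __init__(self, n_channels: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),                     # halve the wavelength axis
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),             # collapse the wavelength axis
        )
        self.head = nn.Linear(32, 1)             # single regression target

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, n_wavelengths) -> (batch, 1)
        return self.head(self.features(x).flatten(1))

model = SpectraCNN()
batch = torch.randn(8, 1, 700)                   # 8 synthetic spectra
out = model(batch)
print(out.shape)
```

Evaluating such a model with the same cross-validation protocol as PLS or Random Forest is what makes the "is deep learning actually better here?" question answerable on 1-D data.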

Results

The educational resources produced will be reused next year, and the interactions, both pedagogical and scientific, were very positive. It is a bit early to really measure the pedagogical impact, as this is only the second year of this deep learning course, and the first in which I have taken part.

However, the influence of generative AI on the code students hand in is already noticeable, as is how they appropriate it: they can either let themselves be guided entirely, without real mastery, which is not good, or quickly take the reins and step back to focus on the scientific content.

As is often the case in teaching, you receive as much as, if not more than, you give. Here, I am extremely happy to have had the opportunity to practice data science on a medium-term mission. It allowed me to gain a lot of perspective and practice on deep learning, to refresh and clarify statistical concepts and algorithms, and to practice using LLMs in the context of data science (as opposed to web application development, in my case).

Technical Stack

The project relies on the following tools and technologies:

  • Deep Learning: PyTorch
  • Machine Learning: scikit-learn
  • Data Processing: NumPy, Pandas
  • Visualization: Matplotlib, Seaborn
  • Development Environment: Jupyter Notebooks

These tools were chosen because they are the de facto standards of the Python data science ecosystem, with PyTorch representing the state of the art for deep learning.